A comparative study in automatic recognition of broadcast audio

Authors

  • Stavros Ntalampiras
  • Nikos Fakotakis
Abstract

This paper describes in detail a methodology that achieves high accuracy in the automatic analysis of broadcast audio. The main objective is to find a feature set for efficient speech/music discrimination while keeping its dimensionality as small as possible. Three groups of parameters, based on the Mel-scale filterbank, the MPEG-7 standard, and wavelet decomposition, are examined in detail. We annotated highly diverse on-line radio recordings, which were used to build probabilistic models and to test four frameworks. The proposed approach uses wavelets to model speech and the MPEG-7 ASP descriptor to model music, and yields an average recognition rate of 98.5%.
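The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: extract a low-dimensional feature vector per frame, fit one probabilistic model per class, and label a recording by the model with the higher likelihood. It does not use the paper's Mel-filterbank, MPEG-7 ASP, or wavelet features; as an assumption for illustration it substitutes log-energy and zero-crossing rate, with one diagonal Gaussian per class. The synthetic tone and noise signals are toy stand-ins for music and speech, and every function name here is hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000   # assumed sampling rate (Hz) for the toy signals
FRAME_LEN = 200      # samples per analysis frame (25 ms at 8 kHz)

def frame_features(signal, frame_len=FRAME_LEN):
    """Return one (log-energy, zero-crossing rate) pair per non-overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((math.log(energy + 1e-12), crossings / (frame_len - 1)))
    return feats

def fit_diag_gaussian(feats, var_floor=1e-4):
    """Fit a diagonal Gaussian: per-dimension mean and (floored) variance."""
    n, dims = len(feats), len(feats[0])
    means = [sum(f[d] for f in feats) / n for d in range(dims)]
    variances = [max(sum((f[d] - means[d]) ** 2 for f in feats) / n, var_floor)
                 for d in range(dims)]
    return means, variances

def log_likelihood(model, feats):
    """Sum of per-frame, per-dimension Gaussian log densities."""
    means, variances = model
    total = 0.0
    for f in feats:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total

def classify(models, signal):
    """Return the label whose model gives the highest total log-likelihood."""
    feats = frame_features(signal)
    return max(models, key=lambda label: log_likelihood(models[label], feats))

def toy_tone(rng, seconds=1.0):
    """Toy stand-in for music: a pure tone with random pitch and level."""
    freq, amp = rng.uniform(200.0, 1000.0), rng.uniform(0.2, 0.8)
    n = int(seconds * SAMPLE_RATE)
    return [amp * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]

def toy_noise(rng, seconds=1.0):
    """Toy stand-in for speech: white noise with a random level."""
    amp = rng.uniform(0.2, 0.8)
    n = int(seconds * SAMPLE_RATE)
    return [amp * rng.uniform(-1.0, 1.0) for _ in range(n)]

def train_toy_models(rng, n_signals=20):
    """Fit one model per class on frames pooled from synthetic training signals."""
    music, speech = [], []
    for _ in range(n_signals):
        music.extend(frame_features(toy_tone(rng)))
        speech.extend(frame_features(toy_noise(rng)))
    return {"music": fit_diag_gaussian(music), "speech": fit_diag_gaussian(speech)}

if __name__ == "__main__":
    rng = random.Random(0)
    models = train_toy_models(rng)
    print(classify(models, toy_tone(rng)))
    print(classify(models, toy_noise(rng)))
```

The two toy classes are separated mainly by zero-crossing rate, which is why such a small feature vector suffices here. Real broadcast audio is far less separable, which is presumably why the paper compares richer feature groups.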

Similar resources

On-line human activity recognition from audio and home automation sensors: Comparison of sequential and non-sequential models in realistic Smart Homes

Automatic human Activity Recognition (AR) is an important process for the provision of context-aware services in smart spaces such as voice-controlled smart homes. In this paper, we present an on-line Activities of Daily Living (ADL) recognition method for automatic identification within homes in which multiple sensors, actuators and automation equipment coexist, including audio sensors. Three ...

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters, which have many private, commercial, and public sector applications, are a rapidly advancing technology. Currently, nothing guarantees the safe operation of these devices in the community. This paper presents three methods for automatically identifying commercial quadcopters. Among these three techniques, two are based on deep neural networks in whi...

Automatic affective state recognition based on physiological changes

Recently, automatic affective state recognition has attracted attention as a way to improve Human Computer Interaction (HCI), clinical research, and various other applications. Compared to audio-visual methods, physiological signals have so far received little attention for affective state recognition. Different affective states stimulate the Autonomic Nervous System (ANS) and lead to changes in physi...

A Stream-based Audio Segmentation, C Pre-processing System for Broadcast

This paper describes our work on the development of a low-latency, stream-based audio pre-processing system for broadcast news using model-based techniques. It performs speech/nonspeech classification, speaker segmentation, speaker clustering, and gender and background-condition classification. To increase modelling accuracy, our algorithms make extensive use of Artificial Neural Networ...

Story Segmentation and Detection of Commercials in Broadcast News Video

The Informedia Digital Library Project [Wactlar96] allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio and that we can seg...

Publication date: 2008